How do I deal with content scrapers? [closed]

Posted by aem on Pro Webmasters
Published on 2012-04-04T02:45:47Z Indexed on 2012/04/04 11:41 UTC


Possible Duplicate:
How to protect SHTML pages from crawlers/spiders/scrapers?

My Heroku (Bamboo) app has been getting a lot of hits from a scraper identifying itself as GSLFBot. Googling that name turns up reports from others who've concluded that it doesn't respect robots.txt (e.g., http://www.0sw.com/archives/96).

I'm considering updating my app to keep a list of banned user-agents, answering every request from one of them with a 400 (or similar) status, and adding GSLFBot to that list. Is that an effective technique? If not, what should I do instead?
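For what it's worth, on a Rack-based Heroku app the banned-list idea can be sketched as a small middleware. This is a minimal illustration, not a vetted solution: the `BlockScrapers` class name, the `BANNED_AGENTS` list, and the choice of a 403 response (arguably a better fit than 400 for a policy rejection) are all my own assumptions, not from the question.

```ruby
# Hypothetical Rack middleware: reject requests whose User-Agent matches a
# banned pattern, pass everything else through to the app.
# BANNED_AGENTS and the 403 status are illustrative assumptions.
class BlockScrapers
  BANNED_AGENTS = [/GSLFBot/i].freeze

  def initialize(app)
    @app = app
  end

  def call(env)
    ua = env['HTTP_USER_AGENT'].to_s
    if BANNED_AGENTS.any? { |pattern| pattern.match?(ua) }
      # Short-circuit: never reaches the app, so the scraper costs almost nothing.
      [403, { 'Content-Type' => 'text/plain' }, ["Forbidden\n"]]
    else
      @app.call(env)
    end
  end
end
```

In a Rails app this would be registered with something like `config.middleware.use BlockScrapers`. The obvious caveat is that it only works as long as the scraper keeps announcing a distinctive user-agent; a bot that spoofs a browser UA needs rate limiting or IP-based blocking instead.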

(As a side note, it seems weird to have an abusive scraper with a distinctive user-agent.)

